Estimating the strength of unlabeled information during semi-supervised learning
Authors
Abstract
In semi-supervised category learning, participants make classification judgements while receiving feedback about the correct answers on some trials (labeled stimuli) but not on others (unlabeled stimuli). Sporadic feedback is common outside the laboratory, so it is important to understand how people learn in this setting. Despite numerous recent studies, the strength and robustness of semi-supervised learning effects remain unclear, particularly when labeled and unlabeled stimuli are interspersed throughout learning. We designed an experiment, using simple unidimensional category learning, that allows us to measure the relative contribution of labeled and unlabeled experience. Based on an analysis of this task, we find that an unlabeled stimulus is worth more than 40% of a labeled stimulus.
Similar resources
Semi-Supervised AUC Optimization without Guessing Labels of Unlabeled Data
Semi-supervised learning, which aims to construct learners that automatically exploit the large amount of unlabeled data in addition to the limited labeled data, has been widely applied in many real-world applications. AUC is a well-known performance measure for a learner, and directly optimizing AUC may result in a better prediction performance. Thus, semi-supervised AUC optimization has drawn...
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach, using random forest as a base classifier, to classify the low-back disorder (LBD) risk associated with industrial jobs. The semi-supervised classification approach uses unlabeled data together with a small number of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...
Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning
Traditional supervised classifiers use only labeled data (features/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, it is often the case that the labeled data is hard to obtain and the unlabeled data contains the instances that belong to the predefined class beyond the labeled data categories. This problem has been widely studied in recent year...
Confidence Estimation for Graph-based Semi-supervised Learning
To select unlabeled examples effectively and reduce classification error, confidence estimation for graph-based semi-supervised learning (CEGSL) is proposed. This algorithm combines graph-based semi-supervised learning with collaboration-training. It makes use of the structure information of the samples to calculate the classification probability of unlabeled examples explicitly. With multi-classifiers, th...
Constraint-Driven Rank-Based Learning for Information Extraction
Most learning algorithms for factor graphs require complete inference over the dataset or an instance before making an update to the parameters. SampleRank is a rank-based learning framework that alleviates this problem by updating the parameters during inference. Most semi-supervised learning algorithms also rely on the complete inference, i.e. calculating expectations or MAP configurations. W...